| Date | Time | Activity |
|---|---|---|
| Mon 03/11/2025 | 10:00 to 16:00 | Unix-like systems, bash, connecting to SCWales |
| Tue 04/11/2025 | 10:00 to 16:00 | More bash, SCWales, using SCWales and slurm |
| Wed 05/11/2025 | 10:00 to 16:00 | Illumina data, BEARCAVE, data processing |
| Thu 06/11/2025 | 10:00 to 16:00 | ANGSD, covariance and distance matrices, heterozygosity, intro to R |
| Fri 07/11/2025 | 10:00 to 16:00 | Maps, PCAs, NJ trees, Manhattan plots and Rmarkdown |
https://drabarlow.github.io/bioinformatics_bootcamp/
https://drabarlow.github.io/bioinformatics_bootcamp/bootcamp_worksheet_2025.html
https://github.com/drabarlow/bioinformatics_bootcamp
bash and R.
bash and R.
(sudo)
Mac OS
Linux
(R typically via Rstudio)
bash
DOS and Unix not yet possible

| | Windows | Mac | Linux |
|---|---|---|---|
| standard PC functions | yes | yes | yes |
| cost | yes | yes | free |
| hardware choice | yes | no | yes |
| bioinformatics | no | yes | yes |
| HPC | no | no | yes |
| open source | no | no | yes |
| active community | no | no | yes |
| games | yes | no | no |
(sh), developed by Stephen Bourne in 1979
(bash)
bash or something like it
(ssh)
scp or sftp
slurm job scheduler
modules

Connecting to the jump host (with MFA)
ssh you25usr@ssh.bangor.ac.uk
Note: most UNIX systems do not show anything when you’re typing your password!
If successful, connecting to Hawk
ssh b.you25usr@hawklogin.cf.ac.uk
Raise your hand if you are having issues 🙌
AI tools are becoming increasingly common in bioinformatics (and everywhere else)
They are incredibly powerful, but keep various considerations in mind (ethics, reproducibility, accuracy, privacy…)
Different models are better or worse for different tasks (e.g. I find ChatGPT 4 is not great at programming)
/ [root] is the uppermost level of the filesystem
/
working directory
/home/b.xlb21brx/
/scratch/b.xlb21brx/
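Root, the working directory, and absolute versus relative paths can be tried out safely in a throwaway directory. The sketch below is only an illustration: `demo_home` and `demo_scratch` are hypothetical stand-ins for real locations such as /home/b.xlb21brx/ and /scratch/b.xlb21brx/.

```shell
# Remember where we started so we can return at the end
start=$(pwd)

# Create a throwaway directory tree to practise in (names are hypothetical)
sandbox=$(mktemp -d)
mkdir -p "$sandbox/demo_home/project" "$sandbox/demo_scratch"

# Move using an absolute path, then print the working directory
cd "$sandbox/demo_home/project"
pwd

# Move using a relative path: ".." means "one level up"
cd ../../demo_scratch
here=$(basename "$(pwd)")
echo "now in: $here"

# "/" is the root, the uppermost level of the filesystem
cd /
root=$(pwd)
echo "root is: $root"

# Go back to where we started and tidy up
cd "$start"
rm -rf "$sandbox"
```

Absolute paths start at `/` and work from anywhere; relative paths are interpreted from the current working directory.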
slurm

| Platform | Million reads | Read length | Gb data | Genome coverage |
|---|---|---|---|---|
| iSeq | 4 | 2 x 150 bp | 1.2 | 0.4x |
| MiniSeq | 25 | 2 x 150 bp | 7.5 | 2.5x |
| MiSeq | 100 | 2 x 500 bp | 30 | 10x |
| NextSeq 550 | 400 | 2 x 150 bp | 120 | 40x |
| NextSeq 1000/2000 | 1800 | 2 x 300 bp | 540 | 180x |
| NovaSeq 6000 | 20000 | 2 x 250 bp | 3000 | 1000x |
| NovaSeq X | 52000 | 2 x 150 bp | 8000 | 2667x |
*Indexes allow multiple samples to be sequenced at the same time
[Not an exhaustive list]
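The "Genome coverage" column is consistent with dividing the data yield by a genome size of about 3 Gb (for example 1.2 Gb / 3 Gb = 0.4x). That genome size is inferred from the table rather than stated, so treat it as an assumption. A minimal sketch of the arithmetic, with a hypothetical helper function `coverage`:

```shell
# Assumed genome size in Gb (inferred from the table, not stated in the source)
genome_gb=3

# coverage = data yield (Gb) / genome size (Gb)
coverage() {
    LC_ALL=C awk -v data="$1" -v genome="$genome_gb" \
        'BEGIN { printf "%.1f\n", data / genome }'
}

iseq_cov=$(coverage 1.2)      # iSeq: 1.2 Gb of data
nextseq_cov=$(coverage 120)   # NextSeq 550: 120 Gb of data
echo "iSeq: ${iseq_cov}x, NextSeq 550: ${nextseq_cov}x"
```

For a species with a different genome size, change `genome_gb`; the same yield gives proportionally more coverage for a smaller genome.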
Short reads from a single individual can be mapped to a reference genome assembly
| sample | locality |
|---|---|
| adder01-04 | Dublin |
| adder05-08 | Belfast |
| adder09-12 | Cork |
| adder13-16 | Limerick |
| adder17-20 | Galway |
| adder21-24 | Dundalk |
| adder25-27 | Bray |
| adder28 | outgroup |

@A00551:758:HKTVJDSX7:4:1101:3595:6872 1:N:0:CCTGAGATGT+GGTCTAGTTG
CTGAATATGGATTTTAATTGAATCCTAAGATATTATAGCATCTTTCACTCCCTGTCCTGTGCATGTCAGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
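Each FASTQ record spans four lines (header, sequence, separator, qualities), so the number of reads is the line count divided by four. The sketch below writes a tiny example file containing two made-up reads (not real data), counts the reads, and decodes the quality character `F` assuming the standard Phred+33 encoding.

```shell
# Write a tiny example FASTQ file (two made-up reads)
fq=$(mktemp)
printf '%s\n' \
    '@read1' 'ACGTACGT' '+' 'FFFFFFFF' \
    '@read2' 'TTGCA'    '+' 'FFFFF' > "$fq"

# Four lines per record, so reads = lines / 4
n_lines=$(wc -l < "$fq")
n_reads=$(( n_lines / 4 ))
echo "reads: $n_reads"

# The sequence is line 2 of every record; print each read length
read_lengths=$(awk 'NR % 4 == 2 { print length($0) }' "$fq" | paste -sd' ' -)
echo "read lengths: $read_lengths"

# Phred+33: quality score = ASCII code of the character minus 33
ascii=$(printf '%d' "'F")
phred=$(( ascii - 33 ))
echo "quality F = Q$phred"

rm -f "$fq"
```

For a gzipped file, the same counts can be obtained by decompressing to a pipe first (for example with `gzip -dc`) instead of reading the file directly.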
45 ka cave bear (Ursus kudarensis)
Cutadapt
FLASH

Expected output in /BEARCAVE2/trimdata/*processing/
*_mappable.fastq.gz [big file]
*_mappable_R1.fastq.gz [big file]
*_mappable_R2.fastq.gz [big file]
*_trim_report.log and merge report *_merge_report.log

bwa mem algorithm
samtools
samtools

Expected output in /BEARCAVE2/mapped*/*processing/
*.bam [big file]
*.bam.bai
*_mapping.log

(plink, admixtools, etc)

| Allele1 | Allele2 | prob11 | prob12 | prob22 |
|---|---|---|---|---|
| A | T | 0.05 | 0.9 | 0.05 |

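One way to read the example row is to take prob11, prob12 and prob22 as the probabilities of the three possible genotypes (Allele1/Allele1, Allele1/Allele2, Allele2/Allele2). This reading is an assumption based on the column names. The sketch below picks the most probable genotype for that row. It only illustrates how to read the values and is not how ANGSD or related tools process them.

```shell
# Example row from the table: Allele1 Allele2 prob11 prob12 prob22
row='A T 0.05 0.9 0.05'

# Pick the genotype with the highest probability
best=$(echo "$row" | LC_ALL=C awk '{
    g[1] = $1 $1; g[2] = $1 $2; g[3] = $2 $2   # AA, AT, TT
    best = 1
    for (i = 2; i <= 3; i++)
        if ($(i + 2) > $(best + 2)) best = i   # compare probability columns
    print g[best], $(best + 2)
}')
echo "most probable genotype: $best"
```

Keeping all three probabilities, rather than only the top genotype, preserves the uncertainty, which matters most at low coverage.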
NGSadmix
PCAngsd
NGSrelate
realSFS

Covariance matrix
Distance matrix
Heterozygosity
realSFS

R
Rstudio
R works
Rstudio
R
tidyverse
R markdown
(git) and other development tools
Rstudio

Tidyverse
ggplot2
tibble
tidyr
readr
dplyr
stringr
purrr
forcats

R from the heretics
Most people disagree (in some cases strongly)
R is really good!
tidyverse is not the way to teach R to beginners
ggplot2 code is restrictive

Objects
<-

Functions
function()
?function

Vector
c()
my_vector[]

Matrix
my_matrix[row, column]

Dataframe
$, which can then be indexed like vectors

List
$

rworldmap and sf
eigen()
ape library

See you next year :)